Dataset: Financial Contributions to Presidential Campaigns (Ohio)
Time: 2016
Reason for choosing this dataset: Ohio is known as a swing state, so its result is often treated as an indicator of the overall election outcome.
========================================================
#### General R library and data loading
## cmte_id cand_id cand_nm
## C00575795:71194 P00003392:71194 Clinton, Hillary Rodham :71194
## C00577130:34686 P60007168:34686 Sanders, Bernard :34686
## C00580100:24166 P80001571:24166 Trump, Donald J. :24166
## C00574624:16406 P60006111:16406 Cruz, Rafael Edward 'Ted':16406
## C00573519: 7937 P60005915: 7937 Carson, Benjamin S. : 7937
## C00581876: 4824 P60003670: 4824 Kasich, John R. : 4824
## (Other) : 5262 (Other) : 5262 (Other) : 5262
## contbr_nm contbr_city contbr_st
## STOWE, JANICE : 277 COLUMBUS : 17328 OH:164475
## MISSLER, ANDREW J. MR.: 203 CINCINNATI: 15630
## BRIONES, BERTA : 179 CLEVELAND : 5778
## MOESER, DONALD : 176 DAYTON : 4634
## CUMMINGS, JOHN : 142 TOLEDO : 3287
## SCHEEL, PATRICK : 133 AKRON : 3206
## (Other) :163365 (Other) :114612
## contbr_zip contbr_employer
## Min. : 10 RETIRED :27097
## 1st Qu.:431109498 N/A :22434
## Median :440942900 SELF-EMPLOYED : 8353
## Mean :368573923 NONE : 7638
## 3rd Qu.:450131451 INFORMATION REQUESTED: 7611
## Max. :458969665 (Other) :91213
## NA's :3 NA's : 129
## contbr_occupation contb_receipt_amt contb_receipt_dt
## RETIRED :43434 Min. :-10800 11-JUL-16: 2211
## NOT EMPLOYED :10378 1st Qu.: 16 06-JUL-16: 2204
## INFORMATION REQUESTED: 7549 Median : 28 12-JUL-16: 1952
## ATTORNEY : 3320 Mean : 120 29-FEB-16: 1534
## HOMEMAKER : 3234 3rd Qu.: 80 12-AUG-16: 1529
## (Other) :96538 Max. : 29100 31-MAR-16: 1463
## NA's : 22 (Other) :153582
## receipt_desc memo_cd
## :162495 :127925
## Refund : 887 X: 36550
## REDESIGNATION FROM PRIMARY: 211
## REDESIGNATION TO GENERAL : 210
## REATTRIBUTION TO SPOUSE : 114
## REATTRIBUTION FROM SPOUSE : 112
## (Other) : 446
## memo_text form_tp
## :114599 SA17A:128232
## * EARMARKED CONTRIBUTION: SEE BELOW: 33677 SA18 : 35356
## * HILLARY VICTORY FUND : 14385 SB28A: 887
## EARMARKED FROM MAKE DC LISTEN : 282
## *BEST EFFORTS UPDATE : 246
## REDESIGNATION FROM PRIMARY : 211
## (Other) : 1075
## file_num tran_id election_tp
## Min. :1003942 A80E77D0E713E417AA88: 3 : 522
## 1st Qu.:1077664 C11887628 : 3 G2016: 56271
## Median :1096260 C10225661 : 2 P2016:107682
## Mean :1095976 C10228611 : 2
## 3rd Qu.:1119042 C10230213 : 2
## Max. :1134173 C10234145 : 2
## (Other) :164461
## party
## Length:164475
## Class :character
## Mode :character
##
##
##
##
## [1] 164475
## [1] 24
## [1] 6555
## [1] 1341
## # A tibble: 3 × 5
## party count total_amount pos percent
## <chr> <int> <dbl> <dbl> <dbl>
## 1 R 58091 11510839 5755420 0.58322342
## 2 D 71241 6744478 14883079 0.34172466
## 3 Other 35143 1481269 18995952 0.07505192
## [1] COLUMBUS COLUMBUS CINCINNATI COLUMBUS COLUMBUS COLUMBUS
## [7] CINCINNATI AKRON DAYTON COLUMBUS COLUMBUS COLUMBUS
## [13] TOLEDO COLUMBUS COLUMBUS
## 1341 Levels: BATAVIA 45320 ABERDEEN ADA ADAMS COUNTY ADDYSTON ... ZOAR
## [1] COLUMBUS COLUMBUS CINCINNATI COLUMBUS COLUMBUS COLUMBUS
## [7] CINCINNATI AKRON DAYTON COLUMBUS COLUMBUS COLUMBUS
## [13] TOLEDO COLUMBUS COLUMBUS
## 10 Levels: COLUMBUS CINCINNATI CLEVELAND DAYTON TOLEDO ... LAKEWOOD
## # A tibble: 1,341 × 2
## contbr_city total_amount
## <fctr> <dbl>
## 1 CINCINNATI 2605688.7
## 2 COLUMBUS 2226563.1
## 3 CLEVELAND 866239.9
## 4 CHAGRIN FALLS 383091.9
## 5 DUBLIN 379636.9
## 6 SHAKER HEIGHTS 376150.9
## 7 AKRON 358729.6
## 8 DAYTON 353846.1
## 9 CANTON 277801.2
## 10 WESTERVILLE 254291.7
## # ... with 1,331 more rows
## # A tibble: 10 × 2
## contbr_city total_amount
## <fctr> <dbl>
## 1 CINCINNATI 2605688.7
## 2 COLUMBUS 2226563.1
## 3 CLEVELAND 866239.9
## 4 CHAGRIN FALLS 383091.9
## 5 DUBLIN 379636.9
## 6 SHAKER HEIGHTS 376150.9
## 7 AKRON 358729.6
## 8 DAYTON 353846.1
## 9 CANTON 277801.2
## 10 WESTERVILLE 254291.7
The Ohio dataset contains 164,475 observations with 18 original variables. For the analysis, I added 3 extra variables (party, Month_Yr and weekday).
The main feature of interest in the dataset is “contb_receipt_amt”, together with the factors that influence it. I’d like to find out which features have the most impact on the contributed amount, and to offer a few suggestions for candidates running a fund-raising campaign in future elections. I suspect city, occupation and day of week matter.
Since the result of the 2016 American presidential election has come out, it is worth comparing the contributed amount data with the final voting result. I downloaded the voting result data to analyze the correlation between the contributed amount and the number of voters in Ohio. (The analysis is covered in the next section.)
Yes, I created 3 variables for further analysis:
1) party: I grouped the data into 3 categories (D, R, Other) based on candidate name.
2) Month_Yr: shows the contributed amount trend by month.
3) weekday: used to check whether there is a large difference between weekdays and weekends.
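The three derived variables could be built roughly as follows. This is a minimal base R sketch, not the code used for this report: the `party_lookup` table and the `parse_fec_date` helper are hypothetical names, the three rows are toy data in the shape of the summary above, and treating any candidate missing from the lookup as "Other" is an assumption. The month is matched against R's built-in English `month.abb`, so the parsing does not depend on the system locale.

```r
# Toy rows in the same shape as the contribution file (names and dates taken from the summary above).
contrib <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Trump, Donald J.", "Sanders, Bernard"),
  contb_receipt_dt = c("11-JUL-16", "29-FEB-16", "12-AUG-16"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from candidate name to party label.
# Sanders is mapped to "Other" only because the sample rows in this report label him that way.
party_lookup <- c(
  "Clinton, Hillary Rodham" = "D",
  "Trump, Donald J." = "R",
  "Sanders, Bernard" = "Other"
)

# Hypothetical helper: parse "DD-MON-YY" strings without relying on locale month names.
parse_fec_date <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  day <- as.integer(vapply(parts, `[`, character(1), 1))
  mon <- match(toupper(vapply(parts, `[`, character(1), 2)), toupper(month.abb))
  year <- 2000L + as.integer(vapply(parts, `[`, character(1), 3))
  as.Date(sprintf("%04d-%02d-%02d", year, mon, day))
}

# 1) party: look up each candidate; anything not in the table falls back to "Other" (assumption).
contrib$party <- unname(party_lookup[contrib$cand_nm])
contrib$party[is.na(contrib$party)] <- "Other"

# 2) Month_Yr: year-month label for monthly trends.
receipt_date <- parse_fec_date(contrib$contb_receipt_dt)
contrib$Month_Yr <- format(receipt_date, "%Y-%m")

# 3) weekday: POSIXlt stores the day of week as 0 (Sunday) to 6 (Saturday).
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
contrib$weekday <- day_names[as.POSIXlt(receipt_date)$wday + 1]

print(contrib)
```

Using the numeric `wday` field rather than `weekdays()` keeps the day labels in English regardless of the machine's locale.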
I enriched the Ohio dataset with zip code data to visualize the contributed amount on a map of Ohio. (The analysis is conducted in the multivariate plots section.)
After merging with the Ohio zip code data from the zipcode library, I found 83 potentially wrong zip code entries, so I excluded them when plotting the contributed amount on the map. I excluded them because it is hard to identify the correct zip code based on city names alone.
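The merge-and-exclude step can be sketched like this. It is a minimal base R illustration under stated assumptions: `zip_ref` is a hypothetical three-row stand-in for the zip code reference table (the real table also carries coordinates), and the last contribution row, with the implausible zip value 10, is made up to show an unmatched entry. The contribution file mixes 9-digit and 5-digit zip values, so the sketch truncates each to its first five digits before matching.

```r
# Toy contributions; the first three rows reuse city/zip pairs shown in this report,
# the fourth row (zip 10) is a made-up bad entry.
contrib <- data.frame(
  contbr_city = c("LEESBURG", "COLUMBUS", "CINCINNATI", "COLUMBUS"),
  contbr_zip = c(451359416, 432141210, 45249, 10),
  contb_receipt_amt = c(25, 40, 2.5, 50),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the zip code reference table.
zip_ref <- data.frame(
  zip = c("45135", "43214", "45249"),
  city = c("LEESBURG", "COLUMBUS", "CINCINNATI"),
  stringsAsFactors = FALSE
)

# Normalize to a 5-character key: print without scientific notation, keep the first 5 digits.
contrib$zip5 <- substr(sprintf("%.0f", contrib$contbr_zip), 1, 5)

# Split into rows that match the reference table and rows that do not.
is_known <- contrib$zip5 %in% zip_ref$zip
unmatched <- contrib[!is_known, ]
mappable <- merge(contrib[is_known, ], zip_ref, by.x = "zip5", by.y = "zip")

cat("total rows:    ", nrow(contrib), "\n")
cat("mappable rows: ", nrow(mappable), "\n")
cat("excluded rows: ", nrow(unmatched), "\n")
```

Keeping the unmatched rows in their own data frame, rather than silently dropping them, makes it easy to report how many entries were excluded.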
## Source: local data frame [2,475 x 4]
## Groups: contbr_city [?]
##
## contbr_city party count total_amount
## <chr> <chr> <int> <dbl>
## 1 batavia R 1 500.00
## 2 45320 R 1 80.00
## 3 aberdeen D 5 900.00
## 4 aberdeen R 2 44.00
## 5 ada D 97 4272.00
## 6 ada Other 43 3682.88
## 7 ada R 18 1458.00
## 8 adams county R 1 80.00
## 9 addyston D 11 392.55
## 10 addyston R 3 190.00
## # ... with 2,465 more rows
## contbr_city D Other R
## 1 batavia 0.00 0.00 500
## 2 45320 0.00 0.00 80
## 3 aberdeen 900.00 0.00 44
## 4 ada 4272.00 3682.88 1458
## 5 adams county 0.00 0.00 80
## 6 addyston 392.55 0.00 190
## contbr_city amount_D amount_Other amount_R votes_D votes_R total_votes
## 1 allen 0.00 0.00 80.00 12815 29858 44636
## 2 ashland 3927.04 657.00 20966.37 5659 17169 24074
## 3 ashtabula 4441.30 4928.40 14448.85 15191 22755 39809
## 4 athens 51808.95 18469.83 12721.48 15552 10816 27941
## 5 belmont 1759.50 0.00 39326.00 8652 20729 30537
## 6 butler 768.00 1028.60 1184.60 56700 104441 168422
## total_amount
## 1 80.00
## 2 25550.41
## 3 23818.55
## 4 83000.26
## 5 41085.50
## 6 2981.20
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?
I noticed that the relationship between the contributed amount and the number of voters is not strongly positive. The relationship appears to be weak, which goes against my original assumption.
When looking at the relationship between the contributed amount and the total voters, Republican party supporters show a stronger correlation than Democratic party supporters.
The correlation coefficients between contributed amount and voter numbers are:
1) Republican party: 0.401
2) Democratic party: 0.184
The Republican coefficient is higher than the coefficient between total contributed amount and total voter numbers in Ohio (0.307), while the Democratic coefficient is lower.
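As a rough illustration of how such coefficients are obtained, the sketch below computes Pearson correlations with base R's `cor()` on the six rows of the merged table printed earlier. Six rows are only the head of the data, so the numbers it prints will not match the coefficients reported above; the `pearson_by_hand` helper is a hypothetical name added only to show the formula behind `cor()`.

```r
# The six rows of the merged amount/votes table shown earlier in this report.
counties <- data.frame(
  amount_D = c(0.00, 3927.04, 4441.30, 51808.95, 1759.50, 768.00),
  amount_R = c(80.00, 20966.37, 14448.85, 12721.48, 39326.00, 1184.60),
  votes_D = c(12815, 5659, 15191, 15552, 8652, 56700),
  votes_R = c(29858, 17169, 22755, 10816, 20729, 104441),
  total_votes = c(44636, 24074, 39809, 27941, 30537, 168422),
  total_amount = c(80.00, 25550.41, 23818.55, 83000.26, 41085.50, 2981.20)
)

# Pearson correlation per party and overall (cor() uses method = "pearson" by default).
r_rep <- cor(counties$amount_R, counties$votes_R)
r_dem <- cor(counties$amount_D, counties$votes_D)
r_all <- cor(counties$total_amount, counties$total_votes)

# Hypothetical helper: the same statistic written out as covariance over the product of spreads.
pearson_by_hand <- function(x, y) {
  sum((x - mean(x)) * (y - mean(y))) /
    sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
}

print(c(republican = r_rep, democratic = r_dem, total = r_all))
```

With only a handful of rows a single large county can dominate the coefficient, which is one reason to compute it on the full table rather than a sample.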
The total contributed amount and the Republican contributed amount are very strongly correlated (the correlation coefficient is 0.934) because contributions from Republican party supporters account for ~60% of the total.
However, this is not a proper pair for checking the relationship, because these 2 factors are not independent: the Republican amount is a component of the total.
## cand_nm contbr_city contbr_zip contb_receipt_amt party
## 1 Cruz, Rafael Edward 'Ted' LEESBURG 451359416 25.00 R
## 2 Cruz, Rafael Edward 'Ted' MINERVA 446579402 25.00 R
## 3 Clinton, Hillary Rodham COLUMBUS 432141210 40.00 D
## 4 Sanders, Bernard COLUMBUS 432022420 50.00 Other
## 5 Clinton, Hillary Rodham LEBANON 450365038 57.31 D
## 6 Sanders, Bernard CINCINNATI 45249 2.50 Other
## [1] 27392
## [1] 27309
## [1] 83
## Source : https://maps.googleapis.com/maps/api/staticmap?center=ohio+state&zoom=7&size=640x640&scale=2&maptype=roadmap&language=en-EN
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=ohio%20state
I noticed that the major cities account for a larger share of the contributed amount. Visualizing the data on the map clearly shows a few hot spots in Ohio.
After splitting the contributed amount by party, the plot shows that more funding went to the Republican party, which is reflected in the voting result: the Republican party won Ohio in the end.
No. I tried to build a linear regression model between numeric and categorical data, but it failed, and it seems to require a more complex statistical library.
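For reference, base R's `lm()` accepts categorical predictors when they are stored as factors and expands them into indicator columns. The sketch below is a minimal example on made-up toy data, not a model fitted to the Ohio dataset: the amounts, the `is_weekend` column and the model formula are all assumptions chosen for illustration.

```r
# Made-up toy data: a numeric response and two categorical predictors stored as factors.
toy <- data.frame(
  contb_receipt_amt = c(25, 40, 50, 100, 15, 250, 35, 80),
  party = factor(c("R", "D", "Other", "R", "D", "R", "Other", "D")),
  is_weekend = factor(c("no", "no", "yes", "no", "yes", "no", "no", "yes"))
)

# lm() turns each factor into indicator (dummy) columns, using the first level as the baseline.
fit <- lm(contb_receipt_amt ~ party + is_weekend, data = toy)

# One intercept plus one coefficient per non-baseline level.
print(coef(fit))
```

Each coefficient is the estimated difference from the baseline level ("D" for party, "no" for is_weekend) with the other predictor held fixed.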
The correlation coefficients between contributed amount and voter numbers are:
1) Republican party: 0.401 (plot 1-2)
2) Democratic party: 0.184 (plot 1-3)
The Republican coefficient is higher than the coefficient between total contributed amount and total voter numbers in Ohio (0.307, plot 1-1), while the Democratic coefficient is lower.
The analysis of contributed amount by weekday shows that the contributed amount is lower on weekends. This might be because people tend to reserve their weekend time for family. I would suggest setting up campaign stops in places where people like to go with their family during the weekend. It might help to increase the funds raised on weekends.
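A weekday versus weekend comparison of this kind can be sketched in base R as follows. The dates and amounts are made-up toy values, and comparing the average total per calendar day (rather than raw totals) is an assumption of this sketch, chosen because there are five weekdays for every two weekend days.

```r
# Made-up toy contributions: 9 and 10 July 2016 fall on a weekend, 11 and 12 July on weekdays.
toy <- data.frame(
  date = as.Date(c("2016-07-09", "2016-07-10", "2016-07-11", "2016-07-12", "2016-07-12")),
  contb_receipt_amt = c(20, 30, 100, 50, 25)
)

# POSIXlt stores the day of week as 0 (Sunday) to 6 (Saturday).
wday <- as.POSIXlt(toy$date)$wday
toy$day_type <- ifelse(wday %in% c(0, 6), "weekend", "weekday")

# Total per calendar day first, then average those daily totals within each day type.
daily <- aggregate(contb_receipt_amt ~ date + day_type, data = toy, FUN = sum)
avg_per_day <- tapply(daily$contb_receipt_amt, daily$day_type, mean)

print(avg_per_day)
```

Averaging daily totals avoids overstating the gap that would appear simply because weekdays outnumber weekend days.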
It shows that the contributed money comes mainly from urban areas such as Columbus, Cleveland, Akron and Cincinnati. This helps candidates identify the cities in which to plan future campaigns for raising more funding.
I distinguish the funding for the Republican party and the Democratic party by color in Plot 3-1. It shows that there is more funding for the Republican party in Ohio, and the voting result also shows that the Republican party won Ohio.
Before starting the analysis, I assumed that the contributed amount would be a strong indicator of the election result. After analyzing the relationship between the election result and the contributed amount data for Ohio, I found that the correlation coefficient between these 2 factors is lower than I expected, so the data does not support a strong correlation between contributed amount and voter numbers.
However, this analysis covers only one state. For further analysis, I would suggest analyzing the data of all states in the U.S. to see whether there is a strong relationship between these 2 factors.
Problem while loading data: duplicate ‘row.names’ are not allowed error in R programming
Why does a boxplot in ggplot requires axis x and y?
What does stat means in ggplot?
ggplot2 line chart gives “geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?”
Find the day of a week in R